HDBSCAN clustering for identification of states with lignin dimers bound to \beta-cyclodextrin from molecular dynamics simulations

Author

Brian Novak

Introduction

In our previous study1, we investigated the interaction between \beta-cyclodextrin and lignin dimer derivatives through experimental methods, molecular dynamics simulations using GROMACS 2018.32,3, and docking with AutoDock Vina46. The relevant chemical structures are depicted in Figure 1. An introduction to the project7 and a summary of the publication8 are available. During the molecular dynamics simulations, we observed multiple types of dimer-cyclodextrin bound states. We estimated the proportions of these states in unbiased simulations using the following procedure:

  1. Computed a large number of collective variables including angle between lignin dimer and \beta-cyclodextrin principal axes, distances between atoms in the \beta-cyclodextrin molecule to atoms in the lignin dimer molecule, and lignin dihedral angles
  2. Applied principal component analysis (PCA)9 to reduce the number of dimensions to two
  3. Clustered the trajectory configurations using Density-Based Spatial Clustering of Applications with Noise (DBSCAN)10,11 to separate the different states into distinct clusters
  4. Counted the number of points in each cluster and computed proportions of each one

The procedure details can be found in the Supporting Information for the previous study1 on pages 9-18. The primary challenge was encountered in step 3, where the states were not distinctly separated. Trial and error was necessary to select DBSCAN parameters that successfully differentiated the clusters.

Figure 1: Structures of lignin dimer derivatives (1-3), and \beta-cyclodextrin. The lower right structures are 3D top and side views of \beta-cyclodextrin. The 3D representations show \beta-cyclodextrin from top and side views, with hydrogen atoms in white, carbon atoms in cyan, and oxygen atoms in red. The face of \beta-cyclodextrin with two hydroxyl (-OH) groups per unit (seen in the top view) is referred to as the secondary face, while the other face is referred to as the primary face. RDKit12 2023.03.3 was used for the 2D representations and VMD13 1.9.314 was used to create the 3D representations.
Code
"""
The files related to this figure are in ./000_Attachments/fig-dimers_BCD_labelled.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

from pathlib import Path
import subprocess
import numpy as np
from rdkit import Chem
from rdkit.Chem import Draw

import warnings

# Suppress UserWarning
warnings.filterwarnings("ignore", category=UserWarning)

# Generate dimer structures
smi_dir = "./000_Attachments/fig-dimers_BCD_labelled"
smi_list = [Path(smi_dir, smi) for smi in ['GG.smi', 'TGG.smi', 'GGBB.smi']]
rdkit_list = [Chem.MolFromSmiles(np.loadtxt(smi, dtype=str)[0]) for smi in smi_list]
dimers_img = Draw.MolsToGridImage(rdkit_list, molsPerRow=3, subImgSize=(400, 200))
outfile = Path(smi_dir, "dimers.png")
with open(outfile, "wb") as png:
    png.write(dimers_img.data)
subprocess.run(f"mogrify -trim {outfile}", shell=True) # ImageMagick

# Generate cyclodextrin 2D structure
smi = Path(smi_dir, "cyclodextrin.smi")
rdkit = Chem.MolFromSmiles(np.loadtxt(smi, dtype=str)[0])
outfile = Path(smi_dir, "BCD.png")
img = Draw.MolToFile(rdkit, outfile, size=(500, 500))
subprocess.run(f"mogrify -trim {outfile}", shell=True)

# A cyclodextrin PDB file, BCD.pdb, is included.
# VMD structures were created manually (top.tga, side.tga).
# Structures were combined into one image (dimers_BCD.png) manually

# Label the image with all structures
script_file = Path(smi_dir, "structures_labelling.sh")
subprocess.run(f"bash {script_file} {smi_dir}", shell=True);

In this study, three carefully selected collective variables were utilized to distinguish the observed states, eliminating the need for dimensionality reduction. Additionally, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN)1517 was used instead of DBSCAN for clustering. Finding the point where the number of clusters as a function of the min_samples = min_cluster_size parameter for HDBSCAN started to never increase provided a way to identify a “natural” number of clusters (see Determination of min_samples for details on the algorithm). Additionally, this eliminated the trial and error that was previously required to separate the clusters using PCA and DBSCAN.1 The analysis uncovered configuration types with the lignin dimer bound to the cyclodextrin center that were not evident through mere visual inspection of the trajectories. However, these were subsets of the visually observed types, and only one cluster was dominant for each group of clusters corresponding to the visually observed types.

Methods

Collective variables

Three collective variables were defined based on important configurations seen in unbiased simulations. The collective variables were computed using PLUMED18 2.6.619.

Normal distance

The first collective variable was a signed distance from the cyclodextrin to the lignin dimer along the direction of a vector pointing from the cyclodextrin center through the secondary face.

d_n = \frac{\overrightarrow{CL} \cdot \overrightarrow{CS}}{\left|\overrightarrow{CS}\right|} \tag{1}

\overrightarrow{CL} is the vector pointing from the cyclodextrin center of mass (COM) to the lignin COM and \overrightarrow{CS} is the vector pointing from the cyclodextrin COM to the center of the hydroxyl oxygen atoms in the cyclodextrin secondary face.

Code
"""
The files related to this figure are in ./000_Attachments/fig-CS_vector.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import MDAnalysis as mda
import py3Dmol

# PDB file name
bcd_pdb = "000_Attachments/fig-CS_vector/BCD.pdb"

# Load the PDB file
u = mda.Universe(bcd_pdb)

# Select the atoms in the cyclodextrin
cyclodextrin = u.select_atoms("resname BCDC")

# Calculate the COM of the cyclodextrin
com_cyclodextrin = cyclodextrin.center_of_mass()

# Select the oxygen atoms in the secondary face (O2 and O3)
oxygen_atoms = cyclodextrin.select_atoms("name O2 O3")

# Calculate the COM of the oxygen atoms
com_oxygen = oxygen_atoms.center_of_mass()

# Calculate the CS vector
cs_vector = com_oxygen - com_cyclodextrin

# Create a py3Dmol view
view = py3Dmol.view(width=800, height=600)

# Add the cyclodextrin molecule in CPK representation
view.addModel(open(bcd_pdb, "r").read(), "pdb")
view.setStyle({"model": 0}, {"sphere": {"scale": 0.3}, "stick": {}})

# Add the CS vector
view.addArrow(
    {
        "start": {"x": com_cyclodextrin[0], "y": com_cyclodextrin[1], "z": com_cyclodextrin[2]},
        "end": {"x": com_oxygen[0], "y": com_oxygen[1], "z": com_oxygen[2]},
        "radius": 0.2,
        "color": "green",
    }
)

# Label the CS vector with an offset
label_offset = [0.3, 0, 0]
view.addLabel(
    "CS",
    {
        "position": {
            "x": com_cyclodextrin[0] + cs_vector[0] + label_offset[0],
            "y": com_cyclodextrin[1] + cs_vector[1] + label_offset[1],
            "z": com_cyclodextrin[2] + cs_vector[2] + label_offset[2],
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Set the background color and zoom to fit the molecule
view.setBackgroundColor("white")
view.zoomTo()

# Rotate the view to show the CS vector
view.rotate(-70, "x")

# Show the view
view.show()

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

Figure 2: \overrightarrow{CS} represented as a green arrow. In the cyclodextrin structure, gray atoms are carbon, red atoms are oxygen, and hydrogen atoms are not shown.
Code
"""
The files related to this figure are in ./000_Attachments/fig-CL_vector.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import numpy as np
import MDAnalysis as mda
import py3Dmol

# PDB file names
bcd_pdb = "000_Attachments/fig-CL_vector/BCD.pdb"
lignin_pdb = "000_Attachments/fig-CL_vector/GG.pdb"

# Load the PDB file for lignin
u_lignin = mda.Universe(lignin_pdb)

# Select the lignin molecule
lignin = u_lignin.select_atoms("all")

# Calculate the COM of the lignin molecule
com_lignin = lignin.center_of_mass()

# Load the PDB file for cyclodextrin
u_cyclodextrin = mda.Universe(bcd_pdb)

# Select the cyclodextrin molecule
cyclodextrin = u_cyclodextrin.select_atoms("all")

# Calculate the COM of the cyclodextrin molecule
com_cyclodextrin = cyclodextrin.center_of_mass()

# Calculate the CL vector
cl_vector = com_lignin - com_cyclodextrin

# Recalculate the CL vector
com_lignin = lignin.center_of_mass()
cl_vector = com_lignin - com_cyclodextrin

# Create a py3Dmol view
view = py3Dmol.view(width=800, height=600)

# Add the cyclodextrin molecule in CPK representation
view.addModel(open(bcd_pdb, "r").read(), "pdb")
view.setStyle({"model": 0}, {"sphere": {"scale": 0.3}, "stick": {}})

# Add the lignin molecule in CPK representation
view.addModel(open(lignin_pdb, "r").read(), "pdb")
view.setStyle({"model": 1}, {"sphere": {"scale": 0.3}, "stick": {}})

# Add the CL vector
view.addArrow(
    {
        "start": {"x": com_cyclodextrin[0], "y": com_cyclodextrin[1], "z": com_cyclodextrin[2]},
        "end": {"x": com_lignin[0], "y": com_lignin[1], "z": com_lignin[2]},
        "radius": 0.2,
        "color": "green",
    }
)

# Label the CL vector
label_offset = [-1.2, 0.4, 0]
view.addLabel(
    "CL",
    {
        "position": {
            "x": com_cyclodextrin[0] + label_offset[0],
            "y": com_cyclodextrin[1] + label_offset[1],
            "z": com_cyclodextrin[2] + label_offset[2],
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Label the cyclodextrin molecule
label_offset = [0, 8.8, 0]
view.addLabel(
    "cyclodextrin",
    {
        "position": {
            "x": com_cyclodextrin[0] + label_offset[0],
            "y": com_cyclodextrin[1] + label_offset[1],
            "z": com_cyclodextrin[2] + label_offset[2],
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Label the lignin molecule
label_offset = [0.5, 5, 0]
view.addLabel(
    "lignin dimer 1",
    {
        "position": {
            "x": com_lignin[0] + label_offset[0],
            "y": com_lignin[1] + label_offset[1],
            "z": com_lignin[2] + label_offset[2],
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Set the background color and zoom to fit the models
view.setBackgroundColor("white")
view.zoomTo()

# Zoom in a bit
view.zoom(1.4)

# Show the view
view.show()

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

Figure 3: \overrightarrow{CL} represented as a green arrow. In the molecular structures, gray atoms are carbon, red atoms are oxygen, and hydrogen atoms are not shown.
Code
"""
The files related to this figure are in ./000_Attachments/fig-dnorm.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import py3Dmol
import numpy as np
import MDAnalysis as mda

# Load PDB files with mdanalysis
cyclodextrin_pdb = "000_Attachments/fig-dnorm/BCD.pdb"
lignin_pdb = "000_Attachments/fig-dnorm/GG.pdb"
cyclodextrin = mda.Universe(cyclodextrin_pdb)
lignin = mda.Universe(lignin_pdb)

# Calculate centers of mass
cyclodextrin_com = cyclodextrin.atoms.center_of_mass()
oxygen_com = cyclodextrin.select_atoms("name O2 O3").center_of_mass()
lignin_com = lignin.atoms.center_of_mass()

# Calculate vectors
CS_vector = oxygen_com - cyclodextrin_com
CL_vector = lignin_com - cyclodextrin_com

# Calculate dnorm
dnorm = np.dot(CS_vector, CL_vector) / np.linalg.norm(CS_vector)

# Create py3Dmol visualization
view = py3Dmol.view(width=800, height=600)

# Add the cyclodextrin molecule in CPK representation. Make it translucent.
view.addModel(open(cyclodextrin_pdb, "r").read(), "pdb")
view.setStyle({"model": 0}, {"sphere": {"scale": 0.3, "opacity": 0.5}, "stick": {"opacity": 0.5}})

# Add the lignin molecule in CPK representation
view.addModel(open(lignin_pdb, "r").read(), "pdb")
view.setStyle({"model": 1}, {"sphere": {"scale": 0.3}, "stick": {}})

# Add green cylinder to represent dnorm
start = cyclodextrin_com
end = cyclodextrin_com + dnorm * (CS_vector / np.linalg.norm(CS_vector))
view.addCylinder(
    {
        "start": {"x": start[0], "y": start[1], "z": start[2]},
        "end": {"x": end[0], "y": end[1], "z": end[2]},
        "radius": 0.2,
        "color": "green",
    }
)

# Label dnorm at the center of the cylinder with an offset.
# Use a white background, black text, and a font size of 20.
cylinder_center = (start + end) / 2
label_offset = [0.25, 0, 0]
view.addLabel(
    "dₙ",
    {
        "position": {
            "x": cylinder_center[0] + label_offset[0],
            "y": cylinder_center[1] + label_offset[1],
            "z": cylinder_center[2] + label_offset[2],
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Label the cyclodextrin molecule
label_offset = [0, 0, 3.8]
view.addLabel(
    "cyclodextrin",
    {
        "position": {
            "x": cyclodextrin_com[0] + label_offset[0],
            "y": cyclodextrin_com[1] + label_offset[1],
            "z": cyclodextrin_com[2] + label_offset[2],
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Label the lignin molecule
label_offset = [-1.65, 0, 3.9]
view.addLabel(
    "lignin dimer 1",
    {
        "position": {
            "x": lignin_com[0] + label_offset[0],
            "y": lignin_com[1] + label_offset[1],
            "z": lignin_com[2] + label_offset[2],
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Set the background color and zoom to fit the models
view.setBackgroundColor("white")
view.zoomTo()

# Rotate the view to see the dnorm vector
view.rotate(90, "x")

# Zoom in a bit
view.zoom(1.1)

view.show()

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

Figure 4: The signed distance from the cyclodextrin to the lignin dimer along \overrightarrow{CS}, d_n, shown as a green cylinder. In the molecular structures, gray atoms are carbon, red atoms are oxygen, cyclodextrin is translucent, and hydrogen atoms are not shown.

Tangential distance

The second collective variable was the distance from the cyclodextrin COM to the lignin dimer COM in the direction perpendicular to \overrightarrow{CS}. It is the length of the second leg of a right triangle with hypotenuse |\overrightarrow{CL}| and one leg d_n.

d_t = \sqrt{\left|\overrightarrow{CL}\right|^2 - d_n^2} \tag{2}

Code
"""
The files related to this figure are in ./000_Attachments/fig-dtang.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import MDAnalysis as mda
import numpy as np
import py3Dmol

# PDB file paths
cyclodextrin_pdb = "000_Attachments/fig-dtang/BCD.pdb"
lignin_pdb = "000_Attachments/fig-dtang/GG.pdb"

# Load the PDB files
u_cyclodextrin = mda.Universe(cyclodextrin_pdb)
u_lignin = mda.Universe(lignin_pdb)

# Calculate the COM for cyclodextrin and lignin dimer
com_cyclodextrin = u_cyclodextrin.atoms.center_of_mass()
com_lignin = u_lignin.atoms.center_of_mass()

# Calculate the COM for oxygen atoms in cyclodextrin secondary face
oxygen_atoms = u_cyclodextrin.select_atoms("name O2 O3")
com_oxygen = oxygen_atoms.center_of_mass()

# Calculate the vectors
CS_vector = com_oxygen - com_cyclodextrin
CL_vector = com_lignin - com_cyclodextrin

# Calculate the projection of CL_vector onto the plane perpendicular to CS_vector
CS_unit_vector = CS_vector / np.linalg.norm(CS_vector)
projection = np.dot(CL_vector, CS_unit_vector) * CS_unit_vector
dtang_vector = CL_vector - projection
dtang = np.linalg.norm(dtang_vector)

# Create the py3Dmol visualization
view = py3Dmol.view()
view.addModel(open(cyclodextrin_pdb, "r").read(), "pdb")
view.addModel(open(lignin_pdb, "r").read(), "pdb")

# Set CPK representation for both molecules. Make the cyclodextrin translucent.
view.setStyle({"model": 0}, {"sphere": {"scale": 0.3, "opacity": 0.5}, "stick": {"opacity": 0.5}})
view.setStyle({"model": 1}, {"sphere": {"scale": 0.3}, "stick": {}})

# Add a green cylinder to represent dtang
start = com_cyclodextrin
end = com_cyclodextrin + dtang_vector
view.addCylinder(
    {
        "start": {"x": float(start[0]), "y": float(start[1]), "z": float(start[2])},
        "end": {"x": float(end[0]), "y": float(end[1]), "z": float(end[2])},
        "radius": 0.2,
        "color": "green",
    }
)

# Label dtang
label_offset = [-5, 0, 0.3]
view.addLabel(
    "dₜ",
    {
        "position": {
            "x": float(end[0] + label_offset[0]),
            "y": float(end[1] + label_offset[1]),
            "z": float(end[2] + label_offset[2]),
        },
        "backgroundColor": "white",
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Label cyclodextrin
label_offset = [-4, 0, 3.8]
view.addLabel(
    "cyclodextrin",
    {
        "position": {
            "x": float(com_cyclodextrin[0] + label_offset[0]),
            "y": float(com_cyclodextrin[1] + label_offset[1]),
            "z": float(com_cyclodextrin[2] + label_offset[2]),
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Label lignin
label_offset = [-1.7, 0, 3.8]
view.addLabel(
    "lignin dimer 1",
    {
        "position": {
            "x": float(com_lignin[0] + label_offset[0]),
            "y": float(com_lignin[1] + label_offset[1]),
            "z": float(com_lignin[2] + label_offset[2]),
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Set the background color and zoom to fit the models
view.setBackgroundColor("white")
view.zoomTo()

# Rotate the view to show dtang
view.rotate(90, "x")

view.show()

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

Figure 5: The distance from the cyclodextrin COM to the lignin dimer COM in the direction perpendicular to \overrightarrow{CS}, d_t, represented as a green cylinder. In the molecular structures, gray atoms are carbon, red atoms are oxygen, cyclodextrin is translucent, and hydrogen atoms are not shown.

Relative orientation

The third collective variable was the cosine of the angle between the vector from the center of the ring atoms in the lignin head to the center of the ring atoms in the lignin tail (\overrightarrow{HT}) and \overrightarrow{CS}.

\cos\theta = \frac{\overrightarrow{HT} \cdot \overrightarrow{CS}}{\left|\overrightarrow{HT}\right|\left|\overrightarrow{CS}\right|} \tag{3}

Code
"""
The files related to this figure are in ./000_Attachments/fig-HT_vector.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import py3Dmol
import numpy as np
import MDAnalysis as mda

# Load PDB file with MDAnalysis
lignin_pdb = "000_Attachments/fig-HT_vector/GG.pdb"
lignin = mda.Universe(lignin_pdb)

# Lignin head and tail positions for HT vector
lignin_head = lignin.select_atoms("resid 1 and name C1 C2 C3 C4 C5 C6").center_of_mass()
lignin_tail = lignin.select_atoms("resid 2 and name C1 C2 C3 C4 C5 C6").center_of_mass()

# Create py3Dmol visualization
view = py3Dmol.view(width=800, height=600)

# Add the lignin molecule in CPK representation
view.addModel(open(lignin_pdb, "r").read(), "pdb")
view.setStyle({"model": 0}, {"sphere": {"scale": 0.3, "opacity": 0.5}, "stick": {"opacity": 0.5}})

# Add green arrow to represent HT vector
view.addArrow(
    {
        "start": {
            "x": float(lignin_head[0]),
            "y": float(lignin_head[1]),
            "z": float(lignin_head[2]),
        },
        "end": {
            "x": float(lignin_tail[0]),
            "y": float(lignin_tail[1]),
            "z": float(lignin_tail[2]),
        },
        "radius": 0.2,
        "color": "green",
    }
)

# Label HT
label_offset = [1, 0, 0]
view.addLabel(
    "HT",
    {
        "position": {
            "x": float(lignin_tail[0] + label_offset[0]),
            "y": float(lignin_tail[1] + label_offset[1]),
            "z": float(lignin_tail[2] + label_offset[2]),
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Label lignin
lignin_com = lignin.atoms.center_of_mass()
label_offset = [-3, 0, 6]
view.addLabel(
    "lignin dimer 1",
    {
        "position": {
            "x": float(lignin_com[0] + label_offset[0]),
            "y": float(lignin_com[1] + label_offset[1]),
            "z": float(lignin_com[2] + label_offset[2]),
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Set the background color and zoom to fit the models
view.setBackgroundColor("white")
view.zoomTo()

# Rotate for better view of the HT vector
view.rotate(-90, "x")
view.rotate(-30, "z")

# Show the view
view.show()

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

Figure 6: The lignin head to tail vector, \overrightarrow{HT}, represented as a green arrow. In the translucent lignin structure, gray atoms are carbon, red atoms are oxygen, and hydrogen atoms are not shown.
Code
"""
The files related to this figure are in ./000_Attachments/fig-theta.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import py3Dmol
import numpy as np
import MDAnalysis as mda

# Load PDB files with MDAnalysis
cyclodextrin_pdb = "000_Attachments/fig-theta/BCD.pdb"
lignin_pdb = "000_Attachments/fig-theta/GG.pdb"
cyclodextrin = mda.Universe(cyclodextrin_pdb)
lignin = mda.Universe(lignin_pdb)

# Calculate centers of mass
cyclodextrin_com = cyclodextrin.atoms.center_of_mass()
oxygen_com = cyclodextrin.select_atoms("name O2 O3").center_of_mass()

# Calculate CS vector
CS_vector = oxygen_com - cyclodextrin_com

# Calculate HT vector
lignin_head = lignin.select_atoms("resid 1 and name C1 C2 C3 C4 C5 C6").center_of_mass()
lignin_tail = lignin.select_atoms("resid 2 and name C1 C2 C3 C4 C5 C6").center_of_mass()
HT_vector = lignin_head - lignin_tail

# Shift HT vector to originate at cyclodextrin COM
HT_vector_shifted = HT_vector + cyclodextrin_com

# Calculate the cosine of the angle θ
cos_theta = np.dot(HT_vector, CS_vector) / (np.linalg.norm(HT_vector) * np.linalg.norm(CS_vector))
theta = np.arccos(cos_theta)

# Create py3Dmol visualization
view = py3Dmol.view(width=800, height=600)

# Add the cyclodextrin molecule in CPK representation
view.addModel(open(cyclodextrin_pdb, "r").read(), "pdb")
view.setStyle({"model": 0}, {"sphere": {"scale": 0.3, "opacity": 0.5}, "stick": {"opacity": 0.5}})

# Add the lignin molecule in CPK representation
view.addModel(open(lignin_pdb, "r").read(), "pdb")
view.setStyle({"model": 1}, {"sphere": {"scale": 0.3}, "stick": {}})

# Add green arrows to represent HT and CS vectors
view.addArrow(
    {
        "start": {
            "x": float(cyclodextrin_com[0]),
            "y": float(cyclodextrin_com[1]),
            "z": float(cyclodextrin_com[2]),
        },
        "end": {
            "x": float(HT_vector_shifted[0]),
            "y": float(HT_vector_shifted[1]),
            "z": float(HT_vector_shifted[2]),
        },
        "radius": 0.2,
        "color": "green",
    }
)
view.addArrow(
    {
        "start": {
            "x": float(cyclodextrin_com[0]),
            "y": float(cyclodextrin_com[1]),
            "z": float(cyclodextrin_com[2]),
        },
        "end": {"x": float(oxygen_com[0]), "y": float(oxygen_com[1]), "z": float(oxygen_com[2])},
        "radius": 0.2,
        "color": "green",
    }
)

# Label HT, CS, and θ
label_offset = [0, 0, 1]
view.addLabel(
    "HT",
    {
        "position": {
            "x": float(HT_vector_shifted[0] + label_offset[0]),
            "y": float(HT_vector_shifted[1] + label_offset[1]),
            "z": float(HT_vector_shifted[2] + label_offset[2]),
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)
label_offset = [-3, 0, 1]
view.addLabel(
    "CS",
    {
        "position": {
            "x": float(oxygen_com[0] + label_offset[0]),
            "y": float(oxygen_com[1] + label_offset[1]),
            "z": float(oxygen_com[2] + label_offset[2]),
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)
label_offset = [0, 0, 1.6]
view.addLabel(
    "θ",
    {
        "position": {
            "x": float(cyclodextrin_com[0] + label_offset[0]),
            "y": float(cyclodextrin_com[1] + label_offset[1]),
            "z": float(cyclodextrin_com[2] + label_offset[2]),
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Label cyclodextrin
label_offset = [-4, 0, 5.8]
view.addLabel(
    "cyclodextrin",
    {
        "position": {
            "x": float(cyclodextrin_com[0] + label_offset[0]),
            "y": float(cyclodextrin_com[1] + label_offset[1]),
            "z": float(cyclodextrin_com[2] + label_offset[2]),
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Label lignin
lignin_com = lignin.atoms.center_of_mass()
label_offset = [-9, 0, -1]
view.addLabel(
    "lignin dimer 1",
    {
        "position": {
            "x": float(lignin_com[0] + label_offset[0]),
            "y": float(lignin_com[1] + label_offset[1]),
            "z": float(lignin_com[2] + label_offset[2]),
        },
        "backgroundOpacity": 0,
        "fontColor": "black",
        "fontSize": 20,
    },
)

# Set the background color and zoom to fit the models
view.setBackgroundColor("white")
view.zoomTo()

# Rotate for better view of θ
view.rotate(-90, "x")
view.rotate(-30, "z")

# Show the view
view.show()

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

Figure 7: The angle \theta between \overrightarrow{HT} and \overrightarrow{CS} represented as a green arrow. The origin of \overrightarrow{HT} is shifted to the cyclodextrin COM. In the molecular structures, gray atoms are carbon, red atoms are oxygen, cyclodextrin is translucent, and hydrogen atoms are not shown.

HDBSCAN clustering

Determination of min_samples

The best value of min_samples = min_cluster_size used in the HDBSCAN algorithm was determined using the following algorithm:

  1. Find the last value of min_samples, S, where the number of clusters corresponding to S+1 is larger than the number of clusters correponding to S. The number of clusters corresponding to S is C.
  2. Choose the smallest value of min_samples > S where the number of clusters is equal to C.

Results

Number of clusters

The numbers of clusters as a function of min_samples are shown in Figure 8, Figure 9, and Figure 10. The best values for min_samples and the corresponding numbers of clusters were determined using the algorithm described in Determination of min_samples.

Figure 8: Number of clusters for dimer 1 as a function of min_samples for the HDBSCAN algorithm. The best value for min_samples and the corresponding number of clusters is circled in red and indicated in the legend.
Code
"""
The files related to this figure are in ./000_Attachments/fig-nclusters_GG.
A singularity image files is ...
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-nclusters_GG/min_samples.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)
Figure 9: Number of clusters for dimer 2 as a function of min_samples for the HDBSCAN algorithm. The best value for min_samples and the corresponding number of clusters is circled in red and indicated in the legend.
Code
"""
The files related to this figure are in ./000_Attachments/fig-nclusters_TGG.
A singularity image files is ...
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-nclusters_TGG/min_samples.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)
Figure 10: Number of clusters for dimer 3 as a function of min_samples for the HDBSCAN algorithm. The best value for min_samples and the corresponding number of clusters is circled in red and indicated in the legend.
Code
"""
The files related to this figure are in ./000_Attachments/fig-nclusters_GG_BB.
A singularity image files is ...
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-nclusters_GG_BB/min_samples.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)

Cluster scatter plots

The 3D scatter plots of the collective variables showing the HDBSCAN clusters for each dimer are shown in Figure 11, Figure 12, and Figure 13.

Code
"""
The files related to this figure are in ./000_Attachments/fig-clusters_GG.
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-clusters-GG/clusters_configs.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)

# Interactive 3D scatter plot
import plotly.io as pio
import json

# Load the figure from the JSON file
with open("./000_Attachments/fig-clusters_GG/clusters.json", 'r') as f:
    fig_data = json.load(f)

# Create a figure from the loaded data
fig = pio.from_json(json.dumps(fig_data))

# Display the figure
fig.show()
Figure 11: Interactive and static 3D scatter plots of the collective variables showing the HDBSCAN clusters for dimer 1. Representative configurations for each cluster are shown around the static scatter plot except for the cyan and brown clusters which include configurations where the lignin dimer interacts (weakly) with the outside of the cyclodextrin ring or where there is no direct contact between the lignin dimer and cyclodextrin at all. The cyclodextrin atoms and bonds are gray except for the primary face oxygen atoms are green and the secondary face oxygen atoms are purple. Atom colors in the lignin dimer molecules: H=white, C=cyan, O=red. The arrows in the configurations point in the directions of \overrightarrow{CS} and \overrightarrow{HT}. HDBSCAN outlier points are not plotted.
Code
"""
The files related to this figure are in ./000_Attachments/fig-clusters_TGG.
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-clusters_TGG/clusters_configs.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)

# Interactive 3D scatter plot
import plotly.io as pio
import json

# Load the figure from the JSON file
with open("./000_Attachments/fig-clusters_TGG/clusters.json", 'r') as f:
    fig_data = json.load(f)

# Create a figure from the loaded data
fig = pio.from_json(json.dumps(fig_data))

# Display the figure
fig.show()
Figure 12: Interactive and static 3D scatter plots of the collective variables showing the HDBSCAN clusters for dimer 2. Representative configurations for each cluster are shown around the static scatter plot except for the orange cluster which includes configurations where the lignin dimer interacts (weakly) with the outside of the cyclodextrin ring or where there is no direct contact between the lignin dimer and cyclodextrin at all. The cyclodextrin atoms and bonds are gray except for the primary face oxygen atoms are green and the secondary face oxygen atoms are purple. Atom colors in the lignin dimer molecules: H=white, C=cyan, O=red. The arrows in the configurations point in the directions of \overrightarrow{CS} and \overrightarrow{HT}. HDBSCAN outlier points are not plotted.
Code
"""
The files related to this figure are in ./000_Attachments/fig-clusters_GG_BB.
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-clusters-GG_BB/clusters_configs.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)

# Interactive 3D scatter plot
import plotly.io as pio
import json

# Load the figure from the JSON file
with open("./000_Attachments/fig-clusters_GG_BB/clusters.json", 'r') as f:
    fig_data = json.load(f)

# Create a figure from the loaded data
fig = pio.from_json(json.dumps(fig_data))

# Display the figure
fig.show()
Figure 13: Interactive and static 3D scatter plots of the collective variables showing the HDBSCAN clusters for dimer 3. Representative configurations for each cluster are shown around the static scatter plot except for the purple cluster which includes configurations where the lignin dimer interacts (weakly) with the outside of the cyclodextrin ring or where there is no direct contact between the lignin dimer and cyclodextrin at all. The cyclodextrin atoms and bonds are gray except for the primary face oxygen atoms are green and the secondary face oxygen atoms are purple. Atom colors in the lignin dimer molecules: H=white, C=cyan, O=red. The arrows in the configurations point in the directions of \overrightarrow{CS} and \overrightarrow{HT}. HDBSCAN outlier points are not plotted.

Definition of cluster groups

For clusters containing configurations with a lignin dimer bound to the center of the cyclodextrin, clusters with similar binding configurations were grouped together. Those groups correspond to configurations that were visually observed in the trajectories. The cluster groups are described below in terms of the normal distances from cylodextrin COM to the lignin dimer COM, head, and tail. This is probably more intuitive than using a combination of d_norm and \cos \theta. The cyclodextrin center COM to lignin dimer COM normal distance is d_{norm} and the cyclodextrin COM to lignin dimer head and tail normal distances are defined analagous to d_{norm} in Equation 1.

Dimer 1 cluster groups

For dimer 1, the cluster groups were named head-sec, center-sec, and tail-sec. The head-sec group consists of clusters where the lignin dimer head distance is usually less than the lignin dimer tail and COM distances and the COM of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face (positive distance). The center-sec group consists of clusters where the lignin dimer COM, head, and tail are about the same distance from the cyclodextrin COM and the COM of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face (positive distance). The tail-sec group consists of clusters where the lignin dimer tail distance is usually less than the lignin dimer head and COM distances and the COM of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face (positive distance). Kernel density estimate (KDE) plots of the cyclodextrin COM to lignin dimer normal distances (COM, head, tail) for each cluster group are shown in Figure 14, Figure 15, and Figure 16.

Code
"""
The files related to this figure are in ./000_Attachments/fig-kde_GG-head-sec.
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_GG-head-sec/cluster_distances_KDE_head-sec.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)
Figure 14: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 1 for the head-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Code
"""
The files related to this figure are in ./000_Attachments/fig-kde_GG-center-sec.
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_GG-center-sec/cluster_distances_KDE_center-sec.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)
Figure 15: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 1 for the center-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Code
"""
The files related to this figure are in ./000_Attachments/fig-kde_GG-tail-sec.
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_GG-tail-sec/cluster_distances_KDE_tail-sec.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)
Figure 16: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 1 for the tail-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.

Dimer 2 cluster groups

For dimer 2, the cluster groups were named head-sec, center-sec, tail-sec, and tail-pri. The head-sec, center-sec, tail-sec groups were defined analogously to dimer 1. The tail-pri group consists of clusters where the lignin dimer tail distance is usually greater than the lignin dimer head and COM distances and the COM of the lignin dimer is closer to the cyclodextrin primary face than the cyclodextrin secondary face (negative distance). KDE plots of the cyclodextrin COM to lignin dimer normal distances (COM, head, tail) for each cluster group are shown in Figure 17, Figure 18, Figure 19, and Figure 20.

Code
"""
The files related to this figure are in ./000_Attachments/fig-kde_TGG-head-sec.
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_TGG-head-sec/cluster_distances_KDE_head-sec.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)
Figure 17: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 2 for the head-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Code
"""
The files related to this figure are in ./000_Attachments/fig-kde_TGG-center-sec.
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_TGG-center-sec/cluster_distances_KDE_center-sec.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)
Figure 18: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 2 for the center-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Code
"""
The files related to this figure are in ./000_Attachments/fig-kde_TGG-tail-sec.
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_TGG-tail-sec/cluster_distances_KDE_tail-sec.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)
Figure 19: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 2 for the tail-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Code
"""
The files related to this figure are in ./000_Attachments/fig-kde_TGG-tail-pri.
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_TGG-tail-pri/cluster_distances_KDE_tail-pri.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)
Figure 20: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 2 for the tail-pri cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.

Dimer 3 cluster groups

For dimer 3, the cluster groups were named end-sec and center. The end-sec group consists of clusters where the distance for one of the lignin dimer ends (molecule is symmetric) is usually less than the distance for the lignin dimer end and the lignin dimer COMs and the COM of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face (positive distance). The center group consists of clusters where the lignin dimer COM is located near the cyclodextrin. KDE plots of the cyclodextrin COM to lignin dimer normal distances (COM, ends) for each cluster group are shown in Figure 21 and Figure 22.

Code
"""
The files related to this figure are in ./000_Attachments/fig-kde_GG_BB-end-sec.
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_GG_BB-end-sec/cluster_distances_KDE_end-sec.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)
Figure 21: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 3 for the center-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Code
"""
The files related to this figure are in ./000_Attachments/fig-kde_GG_BB-center.
This figure was generated using snakemake in the input directory.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

import subprocess
import os
from pathlib import Path

# Current directory
current_dir = Path().resolve()

# Change directory
input_dir = Path("../../input").resolve()
os.chdir(input_dir)

# Run the snakemake command
subprocess.run(
    [
        "snakemake",
        "../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_GG_BB-center/cluster_distances_KDE_center.png",
        "-j",
    ],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Change directory back
os.chdir(current_dir)
Figure 22: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 3 for the end-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.

Proportion of configurations in each cluster

For clusters and cluster groups containing configurations with a lignin dimer bound to the center of the cyclodextrin, the proportions of those configurations belonging to each cluster or cluster group were calculated. The proportions are shown in Table 1, Table 2, and Table 3.

Code
"""
The data for this table is in ./000_Attachments/tbl-fractions_GG/fractions.csv.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

from IPython.display import display, HTML
import pandas as pd

# Read CSV file into a DataFrame
df = pd.read_csv("./000_Attachments/tbl-fractions_GG/fractions.csv")

# Convert DataFrame to HTML without the index
html_table = df.to_html(index=False)

# Display the HTML table in the notebook
display(HTML(html_table))
Table 1: Proportions of center-bound configurations belonging to each cluster and cluster group for dimer 1. Refer to Figure 11 to see the clusters and configurations corresponding to the label numbers. The head-sec, center-sec, and tail-sec labels refer to cluster groups (see Dimer 1 cluster groups). The clusters in each cluster group are listed in the rows above them up until another cluster group label is encountered or the top of the table.
Label Fraction
4 0.0020
5 0.0060
6 0.5051
7 0.0054
8 0.0067
head:sec 0.5253
2 0.0571
center:sec 0.0571
3 0.4176
tail:sec 0.4176

There are 3 cluster groups defined for dimer 1: head-sec, center-sec, and tail-sec (see Dimer 1 cluster groups). The head-sec group consists of 5 clusters. However, most of the configurations belong to cluster 6. Clusters 4, 5, 7 and 8 account for a total of about 2.0% of all configurations, while cluster 6 has about 50.5% of all configurations. The center-sec and tail-sec groups each consist of a single cluster with about 5.7% and 41.8% of all configurations, respectively.

Code
"""
The data for this table is in ./000_Attachments/tbl-fractions_TGG/fractions.csv.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

from IPython.display import display, HTML
import pandas as pd

# Read CSV file into a DataFrame
df = pd.read_csv("./000_Attachments/tbl-fractions_TGG/fractions.csv")

# Convert DataFrame to HTML without the index
html_table = df.to_html(index=False)

# Display the HTML table in the notebook
display(HTML(html_table))
Table 2: Proportions of center-bound configurations belonging to each cluster and cluster group for dimer 2. Refer to Figure 12 to see the clusters and configurations corresponding to the label numbers. The head-sec, center-sec, tail-sec, and tail-pri labels refer to cluster groups (see Dimer 2 cluster groups). The clusters in each cluster group are listed in the rows above them up until another cluster group label is encountered or the top of the table.
Label Fraction
4 0.0229
5 0.7153
head:sec 0.7382
3 0.0280
center:sec 0.0280
2 0.2300
tail:sec 0.2300
1 0.0037
tail:pri 0.0037

There are 4 cluster groups defined for dimer 2: head-sec, center-sec, tail-sec, and tail-pri (see Dimer 2 cluster groups). The head-sec group consists of 2 clusters. However, most of the configurations belong to cluster 5. Cluster 4 accounts for about 2.3% of all configurations, while cluster 5 has about 71.5% of all configurations. The center-sec, tail-sec, and tail-pri groups consist of a single clusters with about 2.8%, 23.0%, and 0.4% of all configurations, respectively.

Code
"""
The data for this table is in ./000_Attachments/tbl-fractions_GG_BB/fractions.csv.
A conda environment file, bcd_ligin.yml, is in ./000_Attachments.
"""

from IPython.display import display, HTML
import pandas as pd

# Read CSV file into a DataFrame
df = pd.read_csv("./000_Attachments/tbl-fractions_GG_BB/fractions.csv")

# Convert DataFrame to HTML without the index
html_table = df.to_html(index=False)

# Display the HTML table in the notebook
display(HTML(html_table))
Table 3: Proportions of center-bound configurations belonging to each cluster and cluster group for dimer 3. Refer to Figure 13 to see the clusters and configurations corresponding to the label numbers. The end-sec and center labels refer to cluster groups (see Dimer 3 cluster groups). The clusters in each cluster group are listed in the rows above them up until another cluster group label is encountered or the top of the table.
Label Fraction
1 0.0717
3 0.0027
end:sec 0.0744
2 0.9256
center 0.9256

There are 2 cluster groups defined for dimer 3: end-sec and center (see Dimer 3 cluster groups). The end-sec group consists of 2 clusters. However, most of the configurations belong to cluster 1. Cluster 3 accounts for about 0.3% of all configurations, while cluster 5 has about 7.2% of all configurations. The center group consists of a single cluster with about 92.6% of all configurations.

Comparison to previous work

dd

Conclusions

sdf

Future work

fwe

References

(1)
Dean, K. R.; Novak, B.; Moradipour, M.; Tong, X.; Moldovan, D.; Knutson, B. L.; Rankin, S. E.; Lynn, B. C. Complexation of Lignin Dimers with β-Cyclodextrin and Binding Stability Analysis by ESI-MS, Isothermal Titration Calorimetry, and Molecular Dynamics Simulations. J. Phys. Chem. B 2022, 126 (8), 1655–1667. https://doi.org/10.1021/acs.jpcb.1c09190.
(2)
Kutzner, C.; Páll, S.; Fechner, M.; Esztermann, A.; de Groot, B. L.; Grubmüller, H. More Bang for Your Buck: Improved Use of GPU Nodes for GROMACS 2018. J. Comput. Chem. 2019, 40 (27), 2418–2431. https://doi.org/10.1002/jcc.26011.
(3)
Welcome to the GROMACS documentation! — GROMACS 2018.3 documentation. https://manual.gromacs.org/documentation/2018.3/index.html (accessed 2024-11-01).
(4)
Eberhardt, J.; Santos-Martins, D.; Tillack, A. F.; Forli, S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J. Chem. Inf. Model. 2021, 61 (8), 3891–3898. https://doi.org/10.1021/acs.jcim.1c00203.
(5)
Trott, O.; Olson, A. J. AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, Efficient Optimization, and Multithreading. J. Comput. Chem. 2010, 31 (2), 455–461. https://doi.org/10.1002/jcc.21334.
(6)
AutoDock Vina. https://vina.scripps.edu/# (accessed 2024-11-02).
(7)
Lignin dimer derivatives interacting with β-cyclodextrin. https://brian-novak.vercel.app/career/website/lignin/lignin-dimer-derivatives-interacting-with-v-cyclodextrin/ (accessed 2024-11-02).
(8)
Complexation of Lignin Dimers with β-Cyclodextrin and Binding Stability Analysis by ESI-MS, Isothermal Titration Calorimetry, and Molecular Dynamics Simulations. https://brian-novak.vercel.app/career/website/lignin/complexation-of-lignin-dimers-with-v-cyclodextrin-and-binding-stability-analysis-by-esi-ms-isothermal-titration-calorimetry-and-molecular-dynamics-simulations/ (accessed 2024-11-02).
(9)
PCA — scikit-learn 1.5.2 documentation. https://scikit-learn/stable/modules/generated/sklearn.decomposition.PCA.html (accessed 2024-11-05).
(10)
DBSCAN — scikit-learn 1.5.2 documentation. https://scikit-learn.org/1.5/modules/generated/sklearn.cluster.DBSCAN.html (accessed 2024-11-05).
(11)
Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining; KDD’96; AAAI Press: Portland, Oregon, 1996; pp 226–231.
(12)
RDKit. https://www.rdkit.org/ (accessed 2024-11-02).
(13)
Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics. J. Mol. Graph. 1996, 14 (1), 33–38. https://doi.org/10.1016/0263-7855(96)00018-5.
(14)
VMD 1.9.3 Documentation. https://www.ks.uiuc.edu/Research/vmd/current/docs.html (accessed 2024-11-02).
(15)
Campello, R. J. G. B.; Moulavi, D.; Sander, J. Density-Based Clustering Based on Hierarchical Density Estimates. https://doi.org/10.1007/978-3-642-37456-2_14.
(16)
Campello, R. J. G. B.; Moulavi, D.; Zimek, A.; Sander, J. Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection. ACM Trans. Knowl. Discov. Data 2015, 10 (1), 5:1–5:51. https://doi.org/10.1145/2733381.
(17)
HDBSCAN — scikit-learn 1.5.2 documentation. https://scikit-learn/stable/modules/generated/sklearn.cluster.HDBSCAN.html (accessed 2024-11-05).
(18)
Tribello, G. A.; Bonomi, M.; Branduardi, D.; Camilloni, C.; Bussi, G. PLUMED 2: New Feathers for an Old Bird. Comput. Phys. Commun. 2014, 185 (2), 604–613. https://doi.org/10.1016/j.cpc.2013.09.018.
(19)
PLUMED: Introduction. https://www.plumed.org/doc-v2.6/user-doc/html/index.html (accessed 2024-11-02).